Scalable robust graph embedding with Spark
Authors
Abstract
Graph embedding aims at learning a vector-based representation of vertices that incorporates the structure of the graph. This representation then enables inference of graph properties. Existing graph embedding techniques, however, do not scale well to large graphs. While several techniques that scale graph embedding using compute clusters have been proposed, they require continuous communication between the compute nodes and cannot handle node failure. We therefore propose a framework for scalable and robust graph embedding based on the MapReduce model, which can distribute any existing embedding technique. Our method splits a graph into subgraphs to learn their embeddings in isolation and subsequently reconciles the embedding spaces derived for the subgraphs. We realize this idea through a novel distributed graph decomposition algorithm. In addition, we show how to implement our framework in Spark to enable efficient learning of effective embeddings. Experimental results illustrate that our approach scales well, while largely maintaining the embedding quality.
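The split, embed-in-isolation, and reconcile pattern described in the abstract can be illustrated with a small plain-Python sketch. It is not the paper's algorithm and does not use Spark: `toy_embed` is a hypothetical stand-in for an arbitrary embedding technique, the overlapping vertex sets are chosen by hand rather than by a decomposition algorithm, and reconciliation is assumed to be a rotation-only alignment in two dimensions estimated from the vertices that two subgraphs share.

```python
import math
from functools import reduce


def rotate(vec, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def toy_embed(vertices, positions, theta):
    """Hypothetical stand-in for any embedding technique.

    Each subgraph is embedded in isolation, so its coordinate system is
    arbitrary; this is simulated by rotating fixed 2-D positions by a
    per-subgraph angle.
    """
    return {v: rotate(positions[v], theta) for v in vertices}


def align(reference, other):
    """Rotate `other` into the space of `reference` via their shared vertices.

    Assumption: the two spaces differ only by a rotation about the origin,
    so the least-squares angle has a closed form (2-D Procrustes).
    """
    shared = reference.keys() & other.keys()
    dot = sum(other[v][0] * reference[v][0] + other[v][1] * reference[v][1]
              for v in shared)
    cross = sum(other[v][0] * reference[v][1] - other[v][1] * reference[v][0]
                for v in shared)
    theta = math.atan2(cross, dot)
    aligned = {v: rotate(p, theta) for v, p in other.items()}
    # Keep the reference coordinates for vertices present in both spaces.
    return {**aligned, **reference}


# Illustrative data: six vertices and three hand-picked overlapping subgraphs.
positions = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 1.0),
             "d": (2.0, 1.0), "e": (2.0, 2.0), "f": (3.0, 2.0)}
subgraphs = [["a", "b", "c"], ["b", "c", "d", "e"], ["d", "e", "f"]]
angles = [0.0, 1.1, -2.3]  # arbitrary orientation of each isolated embedding

# "Map": embed every subgraph independently.
local_embeddings = [toy_embed(vs, positions, t)
                    for vs, t in zip(subgraphs, angles)]

# "Reduce": fold the local spaces into one, aligning on shared vertices.
reconciled = reduce(align, local_embeddings)
```

Because the first subgraph is embedded with angle 0, the reconciled coordinates should match `positions` up to floating-point error. Real embedding spaces generally differ by more than a planar rotation, so an actual reconciliation step would need a more general transformation.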
Similar resources
MILE: A Multi-Level Framework for Scalable Graph Embedding
Recently there has been a surge of interest in designing graph embedding methods. Few, if any, can scale to a large-sized graph with millions of nodes due to both computational complexity and memory requirements. In this paper, we relax this limitation by introducing the MultI-Level Embedding (MILE) framework – a generic methodology allowing contemporary graph embedding methods to scale to larg...
Graph Embedding with Constraints
Recently, graph-based dimensionality reduction has received a lot of interest in many fields of information processing. Central to it is a graph structure which models the geometrical and discriminant structure of the data manifold. When label information is available, it is usually incorporated into the graph structure by modifying the weights between data points. In this paper, we propose a n...
Balanced Graph Partitioning with Apache Spark
A significant part of the data produced every day by online services is structured as a graph. Therefore, there is a need for efficient processing and analysis solutions for large-scale graphs. Among others, balanced graph partitioning is a well-known NP-complete problem with a wide range of applications. Several solutions have been proposed so far; however, most of the existing state-...
Graph Clustering with Dynamic Embedding
Graph clustering (or community detection) has long drawn enormous attention from the research on web mining and information networks. Recent literature on this topic has reached a consensus that node contents and link structures should be integrated for reliable graph clustering, especially in an unsupervised setting. However, existing methods based on shallow models often suffer from content nois...
Journal
Journal title: Proceedings of the VLDB Endowment
Year: 2021
ISSN: 2150-8097
DOI: https://doi.org/10.14778/3503585.3503599